(54) Emulated backup tape drive using data compression

(57) A backup storage device (20) backs up non-compressed data during a backup window period and then, after the backup window period is over and the device (20) is idle, retrieves, compresses, and re-stores the data to reclaim space on the storage medium of the device (20). During operation, a duty cycle having a backup window period and an idle period is defined. When the backup window starts, data is downloaded and stored on the device (20) in non-compressed form. When the idle period begins, the non-compressed data is retrieved, compressed, and then re-stored on the device (20) to reclaim space on the storage medium of the device (20). In one embodiment of the invention, the backup storage device (20) is an emulated tape drive (20) that uses a software compression algorithm to compress the data stored in the device (20).



[FIG. 4: flow diagram 80 showing DEFINE DUTY CYCLE (82); START DUTY CYCLE (84); BEGIN BACKUP (86); DOWNLOAD RAW DATA (88); COMPLETE DOWNLOAD; BACKUP WINDOW EXPIRES, IDLE PERIOD BEGINS (92); RETRIEVE RAW DATA (94); COMPRESS RAW DATA (96); STORE COMPRESSED DATA (98)]



Printed by Jouve, 75001 PARIS (FR) 






EP 1 333 379 A2 






Description 

FIELD OF THE INVENTION 

[0001] The present invention relates generally to backup data storage and, more specifically, to an emulated backup tape drive that stores non-compressed data during backup operations and then afterwards, when the drive is idle, retrieves, compresses, and re-stores the data to reclaim space on the storage medium of the drive.

BACKGROUND 

[0002] With the increasing popularity of Internet commerce and network-centric computing, businesses and other entities are becoming more and more reliant on information. Protecting critical data from loss due to system crashes, virus attacks, and the like is therefore of primary importance. A well-designed data protection program will generally have the ability to (i) instantly restore data in the event of a disaster to enable continued computing operations; (ii) restore data over an extended period of time (hours or days) without disrupting normal computing operations; and (iii) archive copies of data that are retrieved infrequently and with little urgency. Tape drives have long been a choice for storing archival backup data in information systems.
[0003] Historically, many such tape drives have used data compression to maximize the amount of data that can be stored on the tape. Tape, however, is a relatively slow and inefficient storage medium. Consequently, emulated "tape" drives that use arrays of hard drives have recently become more popular. These emulated tape drives often rely on data compression to enable the storage of more data. The problem with current emulated tape drives is that data compression is performed "on the fly" during the backup. In other words, compression occurs in the critical path of the downloading of data, thereby impeding performance. The designers of emulated tape drive systems have therefore relied on expensive, high-speed hardware data compression solutions to achieve an acceptable level of performance. The use of slower, less expensive software compression algorithms has not been a viable option in the past because of a lack of acceptable performance.
[0004] An emulated backup tape drive that stores non-compressed data during backup operations and then afterwards, when the drive is idle, retrieves, compresses, and re-stores the data to reclaim space on the storage medium of the drive is therefore needed.

SUMMARY 

[0005] To achieve the foregoing, and in accordance with the purpose of the present invention, a backup storage device is disclosed that stores non-compressed data during backup operations and then afterwards, when the device is idle, retrieves, compresses, and re-stores the data to reclaim space on the storage medium of the device. During operation, a duty cycle having a backup window period and an idle period is defined. When backups occur during the window, data is downloaded and stored on the device in non-compressed form. Later, during the idle period of the duty cycle, the non-compressed data is retrieved, compressed, and re-stored to reclaim space on the storage medium of the device. Since the compression occurs when the backup device is idle, the rate at which data is backed up is not adversely affected. Thus, a low-cost software data compression algorithm may be used. In one embodiment of the invention, the backup storage device is an emulated tape drive that uses an array of hard drives as the storage medium. In other embodiments, any type of storage medium can be used, such as tape or semiconductor memory chips.
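Because compression is moved off the backup path, an ordinary software codec is sufficient. The following is a minimal sketch, not the device's firmware: it assumes Python's standard-library `zlib` (DEFLATE, the method used by zip and gzip) and `bz2` (bzip2) modules as stand-ins for the software algorithms, and the helper names `compress_with` and `decompress_with` and the sample data are hypothetical.

```python
import bz2
import zlib


def compress_with(algorithm: str, data: bytes) -> bytes:
    """Compress data with one of two stdlib software codecs (illustrative choice)."""
    if algorithm == "deflate":  # LZ77 + Huffman coding, as used by zip and gzip
        return zlib.compress(data, 6)
    if algorithm == "bzip2":    # Burrows-Wheeler based, as used by bzip2
        return bz2.compress(data, compresslevel=9)
    raise ValueError(f"unsupported algorithm: {algorithm}")


def decompress_with(algorithm: str, blob: bytes) -> bytes:
    """Invert compress_with for the same algorithm name."""
    if algorithm == "deflate":
        return zlib.decompress(blob)
    if algorithm == "bzip2":
        return bz2.decompress(blob)
    raise ValueError(f"unsupported algorithm: {algorithm}")


# Highly repetitive sample standing in for backup data (real ratios vary widely).
sample = b"customer_record;status=active;region=EU\n" * 2000

for name in ("deflate", "bzip2"):
    packed = compress_with(name, sample)
    assert decompress_with(name, packed) == sample  # lossless round trip
    print(f"{name}: {len(sample)} -> {len(packed)} bytes")
```

Real backup data compresses far less predictably than this repetitive sample; the point is only that both codecs are lossless and run entirely in software on a general-purpose processor.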

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] The invention, together with further advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings, in which:

Figure 1 is a block diagram of an exemplary information infrastructure in which the emulated backup tape drive of the present invention may be used.

Figures 2A, 2B and 2C are diagrams of the emulated tape drive of the present invention.

Figure 3 is a block diagram of a controller of the emulated tape drive of the present invention.

Figure 4 is a flow diagram illustrating a backup and compression duty cycle of the emulated tape drive of the present invention.

DESCRIPTION 

[0007] Referring to Figure 1, a block diagram of an exemplary information infrastructure in which the emulated backup tape drive of the present invention may be used is shown. The information infrastructure 10 includes a plurality of clients 12 and a plurality of servers 14 coupled together by a client network 16, a primary storage location 18, and one or more emulated tape drives 20 coupled together by a network connection 22. The clients 12 can be any type of client, such as but not limited to a personal computer, a "thin" client, a personal digital assistant, a web-enabled appliance, or a cell phone. The servers 14 may also be of any variety, such as those based on the Unix, Linux, or Microsoft Windows operating systems, or a combination thereof. Likewise, the client network 16 can be any type of network, including but not limited to the Internet, a corporate intranet, a wide area network, a local area network, a wireless network, or any combination thereof. The primary storage location 18 can be arranged in a number of different configurations, such as a storage area network (SAN), network attached storage (NAS), or direct attached storage. In other embodiments, the primary storage location 18 can reside in the chassis/cabinet of the servers 14, in stand-alone storage devices, or in a combination thereof. The connection 22 can be either direct (such as parallel SCSI or IDE) or a network topology such as Fibre Channel or Ethernet (Fast, Gigabit, or 10 Gigabit). Also, when multiple emulated tape drives 20 are used, they may be daisy-chained together to provide more backup capacity.
[0008] Referring to Figure 2A, a perspective view of an emulated tape drive 20 is shown. The emulated tape drive 20 includes a chassis 30, a power supply 32, a pair of fans 34, and an array of hard drives 36, all housed in the chassis 30. A user interface panel 38 is located at the front of the chassis 30. A back plate 40 is provided at the rear portion of the chassis 30.
[0009] Referring to Figure 2B, an exploded perspective view of the hard drives 36 according to one embodiment of the present invention is shown. In this embodiment, the hard drives 36 are configured in a pair of left and right rails (42L and 42R, respectively). Within each rail 42, five (5) columns of three (3) hard drives 36 are arranged in disk packs 44. It should be noted that this embodiment is only exemplary. According to various other embodiments of the present invention, the hard drives 36 may be arranged in any number of rails 42 (rows), and the number of hard drives 36 (columns) per disk pack 44 may vary.
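The drive count of the Figure 2B embodiment follows directly from the layout: 2 rails × 5 disk packs per rail × 3 drives per pack = 30 hard drives 36. The sketch below generalizes that arithmetic to other arrangements, assuming a simple rail/pack/slot addressing scheme that the text does not specify; `DriveLayout` and `drive_addresses` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class DriveLayout:
    rails: int            # rows of disk packs, e.g. left rail 42L and right rail 42R
    packs_per_rail: int   # disk packs 44 in each rail
    drives_per_pack: int  # hard drives 36 stacked in each disk pack

    def total_drives(self) -> int:
        return self.rails * self.packs_per_rail * self.drives_per_pack

    def drive_addresses(self) -> Iterator[Tuple[int, int, int]]:
        """Yield one (rail, pack, slot) tuple per hard drive (hypothetical scheme)."""
        for rail in range(self.rails):
            for pack in range(self.packs_per_rail):
                for slot in range(self.drives_per_pack):
                    yield (rail, pack, slot)


# Figure 2B embodiment: 2 rails x 5 disk packs x 3 drives per pack.
layout = DriveLayout(rails=2, packs_per_rail=5, drives_per_pack=3)
print(layout.total_drives())  # 30
```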
[0010] Referring to Figure 2C, a view of the back plate 40 of the chassis 30 is shown. The back plate 40 includes vents for the fans 34 and a number of input/output ports 46. The input/output ports 46 are provided to connect a controller 48 (not visible because it is internal to the chassis 30) to the primary storage location 18 through the network connection 22. For more details on the features and operation of the emulated tape drive 20, see the co-pending US application entitled "Storage System Utilizing An Active Subset of Drives During Data Storage And Retrieval Operations" by Thomas B. Bolt and Kevin C. Daly, attorney docket no. Q02-1037.US1, filed concurrently herewith and incorporated herein by reference for all purposes.

[0011] Referring to Figure 3, a block diagram of one embodiment of the controller 48 configured for the emulated tape drive 20 illustrated in Figures 2A-2C is shown. In this embodiment, the controller 48 includes a micro-controller 50, such as a microprocessor, configured to communicate with the hard drives 36 of the disk packs 44 through a USB controller 52, a USB hub 54, and a bridge circuit 56. For the sake of simplicity, these components are shown for only one disk pack 44. The remaining four disk packs 44 of the right rail 42R and all of the disk packs 44 of the left rail 42L communicate with the micro-controller 50 in a similar arrangement. In situations where the network connection 22 is Fibre Channel, the micro-controller 50 is connected to the primary storage location 18 through an optical transceiver 58 and a Fibre Channel controller 60. Alternatively, when the network connection 22 is an Ethernet connection (for example, Fast or Gigabit Ethernet), the micro-controller 50 is connected to the primary storage location 18 through an Ethernet transceiver 62 and an Ethernet controller 64. It should be pointed out that these two connections are merely illustrative. In various embodiments of the present invention, multiple Fibre Channel ports and/or multiple Ethernet ports, either alone or in any combination, can be provided. Alternate interfaces such as parallel SCSI (Small Computer System Interface) may also be substituted for, or used in conjunction with, Fibre Channel or Ethernet. Additionally, alternate internal interconnect technologies such as Fibre Channel or parallel SCSI could be used instead of USB.
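The Figure 3 data paths can be summarized as an ordered chain of components. This is an illustrative model only, assuming that each host link type maps to exactly one transceiver/controller pair and that every disk pack is reached through a USB controller, hub, and bridge as described for the one pack shown; the `HostLink` enum and `data_path` function are hypothetical.

```python
from enum import Enum
from typing import List


class HostLink(Enum):
    FIBRE_CHANNEL = "fibre_channel"
    ETHERNET = "ethernet"


# Host-side components between the I/O port 46 and the micro-controller 50.
HOST_SIDE = {
    HostLink.FIBRE_CHANNEL: ["optical transceiver 58", "Fibre Channel controller 60"],
    HostLink.ETHERNET: ["Ethernet transceiver 62", "Ethernet controller 64"],
}

# Drive-side components between the micro-controller 50 and one disk pack 44.
DRIVE_SIDE = ["USB controller 52", "USB hub 54", "bridge circuit 56"]


def data_path(link: HostLink) -> List[str]:
    """Ordered components traversed by backup data from the host to a hard drive."""
    return ["I/O port 46", *HOST_SIDE[link], "micro-controller 50", *DRIVE_SIDE, "hard drive 36"]


for link in HostLink:
    print(f"{link.value}: " + " -> ".join(data_path(link)))
```

Swapping the internal interconnect (for example, to parallel SCSI instead of USB) would only change `DRIVE_SIDE` in this model, which mirrors the text's point that the interface hardware is interchangeable.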

[0012] A system memory 66 and a non-volatile memory 68 are also coupled to the micro-controller 50. In one embodiment of the invention, the system memory 66 is RAM and the non-volatile memory 68 is Flash memory. The non-volatile memory 68 stores the micro-code used to program the micro-controller 50 as well as the compression/decompression software algorithms used by the emulated tape drive 20. It again should be noted that the circuit components of this diagram are merely illustrative of one embodiment of the present invention. Other embodiments would be readily apparent to those skilled in the art. For example, in embodiments with either more or fewer rails 42 and disk packs 44, more or fewer USB controllers 52 and USB hubs 54 would be required. Also, the interface hardware between the input/output ports 46 and the micro-controller 50 would be different if networking protocols other than Fibre Channel or Ethernet were used.

[0013] Referring to Figure 4, a flow diagram 80 illustrating the operation of the emulated tape drive 20 is shown. Initially, a system administrator or user of the information infrastructure 10 defines a duty cycle (step 82) for the emulated tape drive 20. The duty cycle includes a backup window period and an idle period. Usually the backup window period is scheduled at a set time each day, or at some other fixed time interval, when the emulated tape drive 20 is at its lowest utilization. When backups are not occurring, the emulated tape drive 20 is idle.
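A duty cycle of this kind can be represented as a daily window start time plus a window length, with the remainder of the day treated as the idle period. The sketch below assumes a daily schedule and a window shorter than 24 hours; the 22:00 start and 4-hour length are hypothetical example values, and `DutyCycle.phase_at` is an illustrative name.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class DutyCycle:
    """Daily duty cycle: a backup window; every other moment is idle (assumption)."""
    window_start: time
    window_length: timedelta  # assumed to be shorter than 24 hours

    def phase_at(self, moment: datetime) -> str:
        start_today = datetime.combine(moment.date(), self.window_start)
        # A window that opened yesterday may still be running just after midnight.
        for start in (start_today - timedelta(days=1), start_today):
            if start <= moment < start + self.window_length:
                return "backup"
        return "idle"


cycle = DutyCycle(window_start=time(22, 0), window_length=timedelta(hours=4))
print(cycle.phase_at(datetime(2024, 1, 1, 23, 30)))  # backup
print(cycle.phase_at(datetime(2024, 1, 2, 1, 59)))   # backup (window crosses midnight)
print(cycle.phase_at(datetime(2024, 1, 2, 9, 0)))    # idle
```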
[0014] According to one embodiment, when the duty cycle starts (step 84), the backup of data begins (step 86). The data is downloaded from the primary storage location 18 through the input/output ports 46, the micro-controller 50, and the appropriate USB controller 52, USB hub 54, and bridge circuit 56, and is stored (step 88) in non-compressed form on one of the hard drives 36. When the backup window period expires, the idle period begins (step 92). During the idle period, the non-compressed data stored on the hard drives 36 is retrieved (step 94) and provided to the micro-controller 50 through the bridge circuit 56, USB hub 54, and USB controller 52. The micro-controller 50 compresses the data (step 96) and then re-stores it on the hard drives 36 (step 98) to reclaim space on the hard drives 36. When the idle period is over and the next backup window begins, a new duty cycle begins (step 84) and the aforementioned steps are repeated. In various embodiments, any one of a variety of software compression algorithms may be used, such as zip, gzip (GNU zip), bzip, bzip2, Lempel-Ziv, and LZS (Lempel-Ziv-Stac). Alternatively, other compression algorithms can be used.
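Steps 86 through 98 can be sketched as two operations on a block store: one that writes data unmodified during the backup window, and one that, during the idle period, reads each non-compressed block back, compresses it, and re-stores it. This is a minimal in-memory model, assuming Python's `zlib` as the software compressor and a dictionary in place of the hard drives 36; `EmulatedTapeStore`, `backup`, and `compress_idle` are hypothetical names.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StoredBlock:
    payload: bytes
    compressed: bool = False  # False until an idle-period pass re-stores it


@dataclass
class EmulatedTapeStore:
    """In-memory stand-in for the hard drives 36 (hypothetical model)."""
    blocks: Dict[str, StoredBlock] = field(default_factory=dict)

    def used_bytes(self) -> int:
        return sum(len(b.payload) for b in self.blocks.values())

    def backup(self, block_id: str, data: bytes) -> None:
        # Steps 86-88: store incoming data as-is, keeping compression off the backup path.
        self.blocks[block_id] = StoredBlock(payload=data)

    def compress_idle(self) -> int:
        # Steps 94-98: retrieve each non-compressed block, compress it, re-store it.
        reclaimed = 0
        for block in self.blocks.values():
            if block.compressed:
                continue
            packed = zlib.compress(block.payload, 9)
            reclaimed += len(block.payload) - len(packed)  # negative if incompressible
            block.payload, block.compressed = packed, True
        return reclaimed


store = EmulatedTapeStore()
original = b"INSERT INTO orders VALUES (1, 'widget');\n" * 5000
store.backup("db-dump", original)   # during the backup window
before = store.used_bytes()
reclaimed = store.compress_idle()   # during the idle period
after = store.used_bytes()
print(f"{before} bytes before, {after} after, {reclaimed} reclaimed")
```

A second call to `compress_idle` skips blocks already marked compressed, so each new idle period only processes data written during the preceding backup window.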
[0015] When compressed data on the emulated tape drive 20 is needed, it is retrieved and provided to the micro-controller 50. The data is decompressed using the software algorithms stored in the Flash memory 68 and provided to the primary storage location 18 through the appropriate input/output port 46. Because compression algorithms are typically asymmetric, data decompression is much less computationally intensive than compression; consequently, the performance of the emulated tape drive 20 during data retrieval is not significantly degraded by using a software solution.
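On retrieval, the controller only needs to know whether a given block has already been compressed. A minimal sketch, assuming a per-block flag (the text does not describe how compressed blocks are marked) and `zlib` as the codec; `restore` and the `StoredBlock` tuple are hypothetical.

```python
import zlib
from typing import NamedTuple


class StoredBlock(NamedTuple):
    payload: bytes
    compressed: bool  # True once an idle-period pass has re-stored this block


def restore(block: StoredBlock) -> bytes:
    """Return the original bytes for delivery back to primary storage."""
    # Blocks not yet reached by an idle-period pass are still non-compressed.
    if block.compressed:
        return zlib.decompress(block.payload)
    return block.payload


original = b"archive segment\n" * 1000
pending = StoredBlock(original, compressed=False)
packed = StoredBlock(zlib.compress(original), compressed=True)
print(restore(pending) == original, restore(packed) == original)  # True True
```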

[0016] The present invention thus provides an emulated tape device for the backup of archival data that uses a software-based data compression algorithm. Since the compression occurs when the emulated tape drive 20 is idle, the rate at which data is stored during a backup operation is not adversely affected.
[0017] It should be noted that the idle (compression) portion of the duty cycle could occur any time inactivity is detected on the primary data interface 46. After some period of inactivity (for example, 20 minutes), the system could begin retrieving and compressing data. This process can be interrupted at any point if activity is detected on the primary data interface 46. It is acceptable for data to be partially compressed, and it is possible to restart the compression from the point at which it was previously suspended.
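The interruptible, resumable behavior can be sketched with a cursor that records the next block to compress. This is a hypothetical model, assuming the 20-minute example threshold, a fixed block order, and a caller-supplied `host_active` callback standing in for activity detection on interface 46; `IdleCompressor` and its methods are illustrative names, and `zlib` substitutes for whichever software algorithm is used.

```python
import zlib
from typing import Callable, Dict, List

INACTIVITY_THRESHOLD_S = 20 * 60  # example 20-minute threshold from the text


class IdleCompressor:
    """Background compression pass that can be suspended and resumed."""

    def __init__(self, blocks: Dict[str, bytes]) -> None:
        self.blocks = blocks                    # non-compressed blocks awaiting compression
        self.compressed: Dict[str, bytes] = {}  # blocks already re-stored in compressed form
        self.order: List[str] = sorted(blocks)  # fixed order keeps the cursor meaningful
        self.cursor = 0                         # index of the next block to compress

    def should_start(self, seconds_since_last_io: float) -> bool:
        return seconds_since_last_io >= INACTIVITY_THRESHOLD_S

    def run(self, host_active: Callable[[], bool]) -> bool:
        """Compress until finished or host activity appears; True if finished."""
        while self.cursor < len(self.order):
            if host_active():
                return False  # suspend; the cursor marks where to resume
            block_id = self.order[self.cursor]
            self.compressed[block_id] = zlib.compress(self.blocks.pop(block_id))
            self.cursor += 1
        return True


blocks = {f"blk{i}": b"log line\n" * 500 for i in range(5)}
job = IdleCompressor(blocks)
activity = iter([False, False, True])  # host I/O arrives before the third block
first = job.run(lambda: next(activity, False))
paused_at = job.cursor                 # two blocks done, three still non-compressed
second = job.run(lambda: False)        # next idle period: resume at the cursor
print(first, paused_at, second, len(job.compressed))  # False 2 True 5
```

Because each block is either fully compressed or untouched, a suspended pass leaves the store in the partially compressed but consistent state the text describes.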

[0018] Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. For instance, the present invention can readily be practiced with any type of read-write storage medium, such as magnetic tape, silicon memory chips, and devices such as SRAM, DRAM, Flash, EPROM, EUV-PROM, EEPROM, etc. The described embodiments should therefore be taken as illustrative and not restrictive, and the invention should not be limited to the details given herein but should be defined by the following claims and their full scope of equivalents.

Claims 


1. A method, comprising:

defining a duty cycle for the downloading of data to a backup storage device, the duty cycle having a backup window period and an idle period;

receiving data during the backup window period;

storing the data on the backup storage device during the backup window period;

retrieving the data stored on the backup storage device during the idle period after the backup window period;

compressing the data retrieved from the backup storage device during the idle period; and

re-storing the data compressed during the idle period, in compressed form, on the backup storage device during the idle period to reclaim space on the storage device.

2. The method of claim 1, wherein the compression of data is performed using a software data compression algorithm.

3. The method of claim 2, wherein the software data compression algorithm includes one of the following types of algorithms: zip; gzip (GNU zip); bzip; bzip2; Lempel-Ziv; and LZS (Lempel-Ziv-Stac).

4. The method of claim 1, further comprising successively repeating the receiving and storing of data during the backup window periods, and the retrieving, compressing, and storing of compressed data on the backup storage device, during successive duty cycles respectively.

5. The method of claim 1, wherein the backup storage device is an emulated tape drive containing an array of hard drives.

6. The method of claim 1, wherein the data is downloaded over a network from a primary storage location.

7. The method of claim 6, wherein the data is downloaded over a Fibre Channel connection between the primary storage location and the backup storage device.

8. The method of claim 6, wherein the data is downloaded over an Ethernet connection between the primary storage location and the backup storage device.

9. The method of claim 6, wherein the primary storage location and the backup storage device are part of a storage area network.

10. The method of claim 6, wherein the primary storage location and the backup storage device are part of a network attached storage configuration.

11. The method of claim 1, wherein the backup storage device is directly electrically connected to a server.

12. An apparatus comprising:

a backup storage device comprising:
an input/output port;

an array of hard drives configured as backup storage; and

a controller configured to download data received from the input/output port to the array of hard drives during a backup period and then reclaim storage space on the array of hard drives during an idle period following the backup period by retrieving the data stored on the array of hard drives, compressing the retrieved data, and then re-storing the compressed data on the array of hard drives.

13. The apparatus of claim 12, wherein the controller is further configured to execute a software algorithm to compress the retrieved data.

14. The apparatus of claim 13, wherein the software algorithm includes one of the following types of algorithms: zip; gzip (GNU zip); bzip; bzip2; Lempel-Ziv; and LZS (Lempel-Ziv-Stac).

15. The apparatus of claim 13, wherein the software algorithm is stored in a memory associated with the controller.



16. The apparatus of claim 12, further comprising a Fibre Channel controller coupled between the controller and the input/output port, which comprises an optical transceiver.

17. The apparatus of claim 12, further comprising an Ethernet controller coupled between the controller and the input/output port, which comprises an Ethernet transceiver.

18. The apparatus of claim 12, wherein the array of hard drives configured as backup storage further comprises a network hub and bridge circuit coupled between the array of hard drives and the controller.

19. The apparatus of claim 12, further comprising:

a primary storage location coupled to the backup storage device through a network connection.

20. The apparatus of claim 19, wherein the network connection is one of the following types of network connections: Fibre Channel or Ethernet.

21. The apparatus of claim 19, wherein the primary storage location and the backup storage device are arranged in one of the following: a storage area network or a network attached storage configuration.

22. The apparatus of claim 19, further comprising a plurality of clients and servers coupled to the primary storage location through a client network.